6 research outputs found

    Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence

    Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. In this paper we explore all three of these building blocks and propose variations for each that can lead to significantly faster BCD methods. We (i) propose new greedy block-selection strategies that guarantee more progress per iteration than the Gauss-Southwell rule; (ii) explore practical issues like how to implement the new rules when using "variable" blocks; (iii) explore the use of message-passing to compute matrix or Newton updates efficiently on huge blocks for problems with sparse dependencies between variables; and (iv) consider optimal active manifold identification, which leads to bounds on the "active-set complexity" of BCD methods and to superlinear convergence for certain problems with sparse solutions (and in some cases finite termination at an optimal solution). We support all of our findings with numerical results for the classic machine learning problems of least squares, logistic regression, multi-class logistic regression, label propagation, and L1-regularization.
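
    For context on the selection rules discussed above, here is a minimal sketch (an assumed setup, not code from the paper) of single-coordinate descent with the classic Gauss-Southwell rule on the least-squares objective 0.5*||Ax - b||^2. The function name greedy_cd_least_squares and its parameters are illustrative.

        import numpy as np

        def greedy_cd_least_squares(A, b, iters=1000):
            # Coordinate descent on 0.5*||Ax - b||^2 with the Gauss-Southwell rule:
            # at each iteration, update the coordinate with the largest |gradient| entry.
            # Assumes A has no all-zero columns.
            n = A.shape[1]
            x = np.zeros(n)
            residual = A @ x - b                  # r = Ax - b
            col_norms = (A ** 2).sum(axis=0)      # coordinate-wise constants A_j^T A_j
            for _ in range(iters):
                grad = A.T @ residual             # full gradient A^T (Ax - b)
                j = np.argmax(np.abs(grad))       # Gauss-Southwell selection
                step = grad[j] / col_norms[j]     # exact minimization along coordinate j
                x[j] -= step
                residual -= step * A[:, j]        # cheap residual update
            return x

    Note that this naive version recomputes the full gradient for selection at every iteration; the paper's greedy block rules target settings where problem structure (e.g., sparsity) makes greedy selection affordable.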

    Greed is good : greedy optimization methods for large-scale structured problems

    This work looks at large-scale machine learning, with a particular focus on greedy methods. A recent trend caused by big datasets is to use optimization methods that have a cheap iteration cost. In this category are (block) coordinate descent and Kaczmarz methods, as the updates of these methods only rely on a reduced subspace of the problem at each iteration. Prior to our work, the literature cast greedy variations of these methods as computationally expensive with convergence rates comparable to randomized versions. In this dissertation, we show that greed is good. Specifically, we show that greedy coordinate descent and Kaczmarz methods have efficient implementations and can be faster than their randomized counterparts for certain common problem structures in machine learning. We show linear convergence for greedy (block) coordinate descent methods under a revived relaxation of strong convexity from 1963, which we call the Polyak-Lojasiewicz (PL) inequality. Of the proposed relaxations of strong convexity in the recent literature, we show that the PL inequality is the weakest condition that still ensures a global minimum. Further, we highlight the exploitable flexibility in block coordinate descent methods, not only in the different types of selection rules possible, but also in the types of updates we can use. We show that using second-order or exact updates with greedy block coordinate descent methods can lead to superlinear or finite convergence (respectively) for popular machine learning problems. Finally, we introduce the notion of “active-set complexity”, which we define as the number of iterations required before an algorithm is guaranteed to reach the optimal active manifold, and show explicit bounds for two common problem instances when using the proximal gradient or the proximal coordinate descent method.
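
    For reference, the Polyak-Lojasiewicz (PL) inequality mentioned above can be written in its standard form, where \mu > 0 is the PL constant and f^* is the optimal value of f:

        \frac{1}{2}\,\|\nabla f(x)\|^{2} \;\ge\; \mu \bigl( f(x) - f^{*} \bigr) \qquad \text{for all } x.

    In particular, any stationary point (where \nabla f(x) = 0) must satisfy f(x) = f^*, and every strongly convex function satisfies the inequality with \mu equal to its strong-convexity constant, so the condition is strictly weaker than strong convexity.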

    A derivative-free approximate gradient sampling algorithm for finite minimax problems

    Mathematical optimization is the process of minimizing (or maximizing) a function. An algorithm is used to optimize a function when the minimum cannot be found by hand, or finding the minimum by hand is inefficient. The minimum of a function is a critical point and corresponds to a gradient (derivative) of 0. Thus, optimization algorithms commonly require gradient calculations. When gradient information of the objective function is unavailable, unreliable or ‘expensive’ in terms of computation time, a derivative-free optimization algorithm is ideal. As the name suggests, derivative-free optimization algorithms do not require gradient calculations. In this thesis, we present a derivative-free optimization algorithm for finite minimax problems. Structurally, a finite minimax problem minimizes the maximum taken over a finite set of functions. We focus on the finite minimax problem due to its frequent appearance in real-world applications. We present convergence results for a regular and a robust version of our algorithm, showing in both cases that either the function is unbounded below (the minimum is −∞) or we have found a critical point. Theoretical results are explored for stopping conditions. Additionally, theoretical and numerical results are presented for three examples of approximate gradients that can be used in our algorithm: the simplex gradient, the centered simplex gradient and the Gupal estimate of the gradient of the Steklov averaged function. A performance comparison is made between the regular and robust algorithm, the three approximate gradients, and the regular and robust stopping conditions. Finally, an application in seismic retrofitting is discussed.
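
    As a toy illustration of the approximate gradients mentioned above, the sketch below uses central differences along the coordinate directions, which is the simplest instance of a centered simplex gradient, inside a plain descent loop on a small finite minimax objective. It is not the thesis' algorithm (no robust version, stopping tests, or Gupal estimate), and all names and constants are illustrative.

        import numpy as np

        def centered_simplex_gradient(f, x, h=1e-4):
            # Central-difference estimate of the gradient of f at x, one coordinate at a time.
            x = np.asarray(x, dtype=float)
            g = np.zeros_like(x)
            for i in range(x.size):
                e = np.zeros_like(x)
                e[i] = h
                g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
            return g

        # Finite minimax objective: the pointwise maximum of a finite set of smooth functions.
        f1 = lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2
        f2 = lambda x: (x[0] + 1.0) ** 2 + 0.5 * x[1] ** 2
        f = lambda x: max(f1(x), f2(x))

        x = np.array([2.0, 2.0])
        for _ in range(200):
            x -= 0.1 * centered_simplex_gradient(f, x)   # approximate-gradient descent step

    Because the max of smooth functions is generally nonsmooth where the active function changes, a fixed-step loop like this can stall near such kinks; handling that reliably is precisely what the regular and robust algorithms in the thesis address.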